Student team: NO
CORE is a web application framework, written at PNNL primarily by me using open source software such as MySQL and PHP, that lets users interface with underlying data and define different types of views of it. Pulling data into CORE tables is straightforward, and views are defined through a simple interface that supports complex joins and styles. Views can be maps, timelines, or tables, and can be editable, allowing the user to change the data if necessary. Views can also interact with each other, so the user can place a map, a table, and a timeline side by side, each showing the same data in a different way.
ProSPECT is a Java tool for analysts, written at PNNL primarily by me, that supports their process from collecting and sifting through large sets of data, through marshalling and evidence extraction, to the creation and analysis of a hypothesis. Data is accessed through plugins, which define how the data is searched and viewed. It is then normalized and visualized in the marshalling space, where information from multiple data sources can be pulled together freely. In this space, relationships are discovered through co-occurrence in documents or records.
Two Page Summary: NO
Phone-1: What is the Catalano/Vidro social network, as reflected in the cell phone call data, at the end of the time period?
Note: These two files contain all linkages between all nodes, and a list of all nodes. Those nodes that I consider important in the network have unique names. Others are simply left as Unknown. There were no nodes or groups of nodes that were independent within the group.
Phone-2: Characterize the changes in the Catalano/Vidro social structure over the ten day period.
Detailed Answer:
If 200 is indeed Ferdinando Catalano, then the people with whom he interacts most are 1, 2, 3, and 5. He communicates most with 5, so we can suspect that 5 is Esteban. This network is very hierarchical in parts, especially off of node 5, and 5 looks more like a coordinator. If 5 is David Vidro, though, he's not calling either of the other Vidros (2 and 3), and Ferdinando isn't calling any of the others more than he's calling 5. So it seems more plausible that 5 is Esteban, who is actually the one coordinating activities. Therefore, 1, 2, and 3 must be Vidros, and 5 must be Esteban Catalano. There are no direct links between Esteban and any of the Vidros at any point except for a single interaction between 1 and 5. Also, only a few of the people linked off of 1, 2, or 3 can easily be linked to the hierarchy emanating from 5. Nodes 1, 2, and 3, then, are not considered to be part of the important network.
To address the geography of this task, I split the island into three natural regions and colored the cell towers in my views accordingly. This helped immensely when analyzing the players. The people with whom 5 interacted either stayed in the same region the entire time or bounced frequently between two regions. Dividing the island into regions made these bounces easier to identify, since a change of region represents a larger and more deliberate movement.
One important shift in the structure occurs from 6/07 to 6/08. A large number of people were calling 5 before the shift. The next day those same people were calling 306 instead. Oddly, the network for 5 appears to be nearly identical to that of 306, so one possibility is that the same person switched phones. Another possibility is that tasks switched from 5 to 306. From now on I'll assume they are the same person and refer to both 5 and 306 as 5. The people who called both numbers are interesting:
List A:
6, 10, 11, 21, 22, 24, 25, 26, 28, 37, 77, 97, 105, 107, 113, 114, 147, 150, 155, 157, 180, 184, 189, 194, 201, 220, 246, 267, 314, 316, 326, 348, 366, 383.
This is still a large set, but it's significantly smaller than our original 400. I created a view in CORE (see challenges 1 and 2 for how CORE is used to create views of data) showing phone calls from these users, sorted by date and grouped by person, and some patterns emerged. Many people called from the same region all the time, but some were highly mobile, moving between regions on the island, sometimes back and forth a couple of times during a day:
This list of calls from person 184 indicates that they were bouncing between the North and Central regions of the island. Notice also that 184 usually calls 49 from tower 13 and 5 from tower 2. List B is the subset of List A made up of these bouncers.
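The selection behind List A amounts to intersecting two sets of callers. Here is a minimal Python sketch of that idea, not the CORE view itself. The record layout `(caller, callee, date)` and the sample rows are assumptions for illustration, and IDs 998 and 999 are made up.

```python
# Hypothetical call records: (caller, callee, date). The real data has more
# fields; only these three matter for this step. IDs 998 and 999 are made up.
calls = [
    (184, 5, "6/05"),
    (184, 5, "6/07"),
    (184, 306, "6/08"),
    (11, 5, "6/06"),
    (11, 306, "6/09"),
    (998, 5, "6/06"),    # called 5 but never 306
    (999, 306, "6/08"),  # called 306 but never 5
]

def callers_of(records, callee):
    """Return the set of people who called `callee` at least once."""
    return {src for src, dst, _ in records if dst == callee}

def called_both(records, first, second):
    """People who called both numbers, sorted for easy reading."""
    return sorted(callers_of(records, first) & callers_of(records, second))

print(called_both(calls, 5, 306))
```

Using sets keeps the intersection simple and ignores how many times each person called.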
List B:
10, 11, 24, 37, 113, 114, 184, 194, 220, 267, 316, 326, 348, 366.
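One way to pick out bouncers like those in List B is to map each tower to a region and count how often a caller's region changes between consecutive calls. This Python sketch makes several assumptions: the tower-to-region table is a placeholder (I drew the real regions by eye on the map), the record layout is invented, caller 998 is made up, and the `min_changes` threshold is an arbitrary illustration rather than the rule I actually applied.

```python
# Hypothetical tower-to-region lookup; the real assignment came from the map.
TOWER_REGION = {2: "North", 13: "Central", 20: "South"}

# Hypothetical records in chronological order: (caller, callee, tower).
calls = [
    (184, 5, 2),
    (184, 49, 13),
    (184, 5, 2),
    (184, 49, 13),
    (998, 5, 20),   # made-up caller who never leaves one region
    (998, 5, 20),
]

def region_changes(records, tower_region):
    """Count, per caller, how often consecutive calls come from different regions."""
    last_region = {}
    changes = {}
    for caller, _, tower in records:
        region = tower_region[tower]
        changes.setdefault(caller, 0)
        if caller in last_region and last_region[caller] != region:
            changes[caller] += 1
        last_region[caller] = region
    return changes

def bouncers(records, tower_region, min_changes=2):
    """Callers whose region changed at least `min_changes` times."""
    counts = region_changes(records, tower_region)
    return sorted(person for person, n in counts.items() if n >= min_changes)

print(region_changes(calls, TOWER_REGION))
print(bouncers(calls, TOWER_REGION))
```

Counting region changes rather than tower changes ignores small moves between neighboring towers, which is the reason for splitting the island into regions in the first place.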
One thing that's quickly visible with some of these bouncers is that their calls have a pattern; they will often call primarily two other people:
List C:
11 -> 5, 104
114 -> 5, 87
184 -> 5, 49
194 -> 5, 22, 291
220 -> 5, 186, 277
267 -> 5, 231, 387
314 -> 5, 250
316 -> 5, 52
326 -> 5, 90
348 -> 5, 163
366 -> 5, 30
383 -> 5, 92
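A list like List C can be approximated by counting, for each caller of interest, how often they call each recipient and keeping the most frequent ones. This is a hedged Python sketch; the `(caller, callee)` rows are invented sample data, callee 997 is made up, and `top_n` is an illustrative parameter, since I picked the main contacts by reading the CORE view rather than by a fixed cutoff.

```python
from collections import Counter, defaultdict

# Hypothetical records: (caller, callee). Callee 997 is made up.
calls = [
    (184, 5), (184, 49), (184, 5), (184, 49), (184, 997),
    (316, 5), (316, 52), (316, 52),
]

def main_contacts(records, callers, top_n=2):
    """For each caller of interest, return their `top_n` most-called recipients."""
    counts = defaultdict(Counter)
    for caller, callee in records:
        if caller in callers:
            counts[caller][callee] += 1
    return {
        caller: sorted(callee for callee, _ in counts[caller].most_common(top_n))
        for caller in sorted(counts)
    }

for caller, contacts in main_contacts(calls, {184, 316}).items():
    print(caller, "->", contacts)
```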
With a list that includes the recipients of list C above, we can switch the view so that we see who is calling these people. For most of them, there's a single person or a few people who call them daily:
List D:
49 <- 210, 252, 370
52 <- 76, 143, 299, 343
87 <- 255, 294
90 <- 204, 385, 399
92 <- 20, 119, 228, 286, 290, 377
163 <- 56
231 <- 47, 267
291 <- 204
387 <- 69
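Reversing the view, as done for List D, means grouping calls by recipient and keeping the callers who show up on most or all days. The Python sketch below assumes a `(caller, callee, day)` layout with invented rows, and caller 997 is made up. The `min_fraction` cutoff is my illustration of "daily", not a value taken from the analysis.

```python
from collections import defaultdict

# Hypothetical records: (caller, callee, day). Caller 997 is made up.
calls = [
    (210, 49, "6/01"), (210, 49, "6/02"),
    (252, 49, "6/01"), (252, 49, "6/02"),
    (997, 49, "6/01"),  # one-off caller, should be filtered out
]

def daily_callers(records, targets, min_fraction=1.0):
    """For each target, list callers who called on at least
    `min_fraction` of the days present in the records."""
    all_days = {day for _, _, day in records}
    days_called = defaultdict(set)
    for caller, callee, day in records:
        if callee in targets:
            days_called[(callee, caller)].add(day)
    result = defaultdict(list)
    for (callee, caller), days in days_called.items():
        if len(days) >= min_fraction * len(all_days):
            result[callee].append(caller)
    return {callee: sorted(callers) for callee, callers in result.items()}

print(daily_callers(calls, {49}))
```

Lowering `min_fraction` admits callers who miss a day or two, which may be closer to what real data looks like.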
Thus we have a level of people (Level 4) that call an upper level (Level 3) every day. We also have a level of people (Level 2), the bouncers, who frequently call the Level 3 people. And the Level 2 bouncers all report to Level 1, who is #5:
Next I brought in another tool I've developed called ProSPECT. This tool allows us to write a plugin to access the data, then use that data source to create a network of nodes. The nodes can then be linked by co-occurrence, with the size of the link representing the number of co-occurrences. Thus, people who talk more frequently have thicker lines. Using ProSPECT, you can approach a graph in one of two ways. You can start by showing everything:
This is clearly a mess and offers no meaningful way of seeing the important parts of the network. The other approach is to start with something you know and see what is related to it. Here, I started with 200 and saw which items were related to it. Then I added the ones that I wanted and found the relationships between all the nodes. You can continue to do this, building up the graph with the nodes you want while staying aware of the graph's context and understanding its structure from the beginning. The included video shows how I used ProSPECT to search the cell records using a plugin, then used those records to generate a graph of the network.
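The "start from a known node and grow outward" approach can be illustrated outside ProSPECT with a few lines of Python. This is a sketch of the general idea only, not ProSPECT's Java implementation: the function names are hypothetical, the call rows are invented, and link weight here is simply the number of calls between two people in either direction.

```python
from collections import Counter

# Hypothetical records: (caller, callee).
calls = [
    (200, 5), (200, 5), (5, 200), (200, 1),
    (5, 184), (184, 49), (184, 49),
]

def link_weights(records):
    """Undirected link weights: number of calls between each pair."""
    weights = Counter()
    for a, b in records:
        weights[tuple(sorted((a, b)))] += 1
    return weights

def neighbors(weights, node):
    """Nodes linked to `node`, with the weight of each link."""
    result = {}
    for (a, b), w in weights.items():
        if a == node:
            result[b] = w
        elif b == node:
            result[a] = w
    return result

def subgraph(weights, nodes):
    """Keep only links whose endpoints are both in the chosen node set."""
    return {pair: w for pair, w in weights.items()
            if pair[0] in nodes and pair[1] in nodes}

weights = link_weights(calls)
print(neighbors(weights, 200))             # candidates to add next
print(subgraph(weights, {200, 5, 184}))    # graph built so far
```

Starting from 200, you would inspect its neighbors, add the ones of interest to the node set, and redraw the subgraph, repeating until the structure is clear.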
Looking at the nodes we identified already, we can see that the structure is essentially as described. There are some extra links indicating calls that did not match the network perfectly, but this is hardly surprising. What is important is that the general structure is as I discovered through CORE:
Summary
In summary, the network has four levels, with each level performing a different function. In exploring the links out of 1, 2, and 3 (below), I did not find any interesting networks. It is the network off of #5 that is most intriguing, and it appears to carry out its operations on a regular basis.